MeLos: Analysis and Modelling of Speech Prosody and Speaking Style
Abstract
This thesis addresses the modelling of speech prosody for speech synthesis and presents MeLos, a complete system for the analysis and modelling of speech prosody, “the music of speech”. Research into the analysis and modelling of speech prosody has increased dramatically in recent decades, and speech prosody has emerged as a crucial concern for speech synthesis. The challenge of speech prosody modelling is to model the variations of speech prosody that depend on the context, whether linguistic (e.g., linguistic structure), para-linguistic (e.g., emotion), or extra-linguistic (e.g., socio-geographical origins, situation of a communication). Modelling the variability of speech prosody is required to provide natural, expressive, and varied speech in many applications of high-quality speech synthesis, such as multimedia (avatars, video games, storytelling, dialogue systems) and artistic (cinema, theatre, music, digital arts) applications. The objective of this thesis is to model, vary, and adapt the strategies, alternatives, and speaking style of a speaker for natural, expressive, and varied speech synthesis. Its original contributions stem from the particular attention paid to combining theoretical linguistics and statistical modelling to provide a complete speech prosody system that can be used for speech synthesis. In particular, speech prosody characteristics are described at three linguistic levels, from signal variations to abstract representations. A unified discrete/continuous context-dependent HMM is presented to model the symbolic and the acoustic characteristics of speech prosody.
A rich description of the text characteristics, based on a linguistic processing chain that includes surface and deep syntactic parsing, is proposed to refine the modelling of speech prosody in context. Segmental HMMs and Dempster-Shafer fusion are used to balance linguistic and metric constraints in the production of a pause. A context-dependent HMM is proposed to model the f0 variations based on the stylization and trajectory modelling of short- and long-term variations simultaneously over various temporal domains. The proposed system is used to model the strategies and alternatives of a speaker, and is extended to the modelling of speaking style shared among speakers using shared-context-dependent modelling and speaker normalization techniques.
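To make the Dempster-Shafer fusion mentioned in the abstract more concrete, the following is a minimal sketch of Dempster's rule of combination applied to two sources of evidence about whether a pause occurs. The function name `combine_dempster`, the two-element frame, and all mass values are hypothetical illustrations; the abstract does not specify how the thesis sets up or applies the fusion, so this should not be read as the author's method.

```python
from itertools import product


def combine_dempster(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    Each mass function maps a frozenset (a subset of the frame of
    discernment) to a mass; the masses of each function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence accumulates on the intersection.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Disjoint subsets contribute to the conflict mass.
            conflict += mass_a * mass_b
    if not combined:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalize so the remaining masses sum to 1.
    norm = 1.0 - conflict
    return {subset: mass / norm for subset, mass in combined.items()}


# Hypothetical frame of discernment: a pause is produced or it is not.
PAUSE = frozenset({"pause"})
NO_PAUSE = frozenset({"no_pause"})
EITHER = PAUSE | NO_PAUSE  # mass left uncommitted by a source

# Illustrative (made-up) masses from a linguistic and a metric source.
linguistic = {PAUSE: 0.6, NO_PAUSE: 0.1, EITHER: 0.3}
metric = {PAUSE: 0.3, NO_PAUSE: 0.4, EITHER: 0.3}

fused = combine_dempster(linguistic, metric)
for subset, mass in fused.items():
    print(sorted(subset), round(mass, 4))
```

Mass assigned to `EITHER` represents a source's uncertainty. This is what lets the rule weigh a confident source against a hesitant one instead of simply averaging the two.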
Similar resources
Prosodic analysis of storytelling discourse modes and narrative situations oriented to text-to-speech synthesis
The generation of synthetic speech with a certain degree of expressiveness has been successful for some particular applications or speaking styles (e.g. emotions). In this context, there is a particular speaking style with subtle speech nuances that may be of great interest for delivering expressive speech: the storytelling style. The purpose of this paper is to define a first step towards deve...
Prosody control for speaking and singing styles
By proper control of prosody, text-to-speech systems already have the capability to imitate distinctive speaking styles. We show two examples where we can capture the critical features: the singing style of Dinah Shore and the speaking style of Martin Luther King Jr. The styles are described by Stem-ML tags (soft template mark-up language), which offers the flexibility needed to control accent ...
The prosody of the TV news speaking style in Brazilian Portuguese
This study characterizes the prosodic structure of the TV news speaking style in Brazil and compares it to the speech of interview subjects on a television talk show. Fifteen distinct metrics, designed to characterize both temporal and melodic characteristics of speech, were evaluated on the two speaking styles. The results of the analysis show that the TV news speaking style is characterized b...
The Prosody of Excitement in Horse Race Commentaries
This study investigates examples of horse race commentaries and compares the acoustic properties with an auditorily based description of the typical suspense pattern from calm to very excited at the finish and relaxation after the finish. With the exception of tempo, the auditory impressions were basically confirmed. The examination shows further that the results of the investigated prosodic pa...
A Model for Varying Speaking Style in TTS systems
This paper aims to enhance the performance of a TTS system by generating various speaking styles. First we describe three speaking styles (Radio News, Political Address and Conversation) and compare the prosodic features found in these authentic styles with the prosody in “neutral” speech uttered by the eLite TTS system ([1]). Differences concern about 20 prosodic characteristics (F0 span, spee...
Publication date: 2011